Fusing automatically extracted semantic annotations

Author

  • Andriy Nikolov
Abstract

One of the necessary preconditions of the Semantic Web initiative is the availability of semantic data. The Web already contains large amounts of information intended for human users. This information is mainly stored as hypertext, which must be semantically annotated to make it accessible to software agents. The amount of information on the Web makes manual annotation infeasible, so automatic information extraction algorithms are essential. These algorithms use various NLP and machine learning techniques to extract information from text (Ciravegna 2003; Cimiano 2005). The information extracted from different sources must then be integrated into a knowledge base so that it can be queried in a uniform way. This integration process is called knowledge fusion.

Semantic annotations extracted automatically will inevitably contain defects, which can cause problems during integration. These defects include the following (based on Appriou, Ayoun et al. 2001):

1. Ambiguity. In general, information that can be interpreted in several distinct ways is considered ambiguous. This applies when it is impossible to decide which real-world item the information refers to. For instance, in geographical information, a document describing a country named "Korea" does not allow a software agent to judge automatically whether it refers to North or South Korea.

2. Uncertainty. Uncertainty refers to the case when it is not possible to say definitely whether a particular Boolean statement is true or false. For instance, information can be biased depending on the source it comes from, and the extraction algorithm itself may extract it incorrectly. The NationByNation.com website, for example, often contains values (like "Russia's unemployment rate is equal to 1.5%") that differ from those in other sources. Such information should be assigned lower reliability.

3. Imprecision. Sometimes the content of the statement is itself imprecise. For example, a statement may contain a rounded number instead of the precise one (e.g., "$1.2 million" and "$1212000").

4. Incompleteness. In most cases an information source does not contain full information about the real-world item it describes, and some properties may be missing from the description. For instance, an entity such as a country has many properties (economic, geographic, etc.), and it is unlikely that every source document will explicitly mention all of them.

5. Vagueness. Sometimes a predicate of a statement can be …
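
The imprecision case in the list above (a rounded figure versus an exact one) can be illustrated with a short Python sketch. This is not the author's method: the function name `consistent_with_rounding` and the rule "two values agree if the precise one rounds to the rounded one at the rounded one's own precision" are assumptions made here purely for illustration, and the inputs are assumed to have been normalized to plain numeric strings already ("$1.2 million" written as "1.2e6").

```python
from decimal import Decimal


def consistent_with_rounding(rounded: str, precise: str) -> bool:
    """Hypothetical helper: does `precise` round to `rounded`?

    The precision is taken from how `rounded` is written, e.g. "1.2e6"
    has two significant digits, so comparison happens at steps of 100000.
    """
    r = Decimal(rounded)
    p = Decimal(precise)
    # quantize() rounds p to the same exponent as r
    # (default rounding mode of the decimal context: half-even).
    return p.quantize(r) == r


# "$1.2 million" vs "$1212000" from the abstract's example
print(consistent_with_rounding("1.2e6", "1212000"))
# A value that does not round to 1.2 million
print(consistent_with_rounding("1.2e6", "1312000"))
```

A fusion step could use a check of this kind to treat the two statements as compatible rather than conflicting, although the actual criteria used in the paper are not given in the abstract.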

Similar resources

Fusing automatically extracted annotations for the Semantic Web

This research focuses on the problem of semantic data fusion. Although various solutions have been developed in the research communities focusing on databases and formal logic, the choice of an appropriate algorithm is non-trivial because the performance of each algorithm and its optimal configuration parameters depend on the type of data to which the algorithm is applied. In order to be re...

UNT-Yahoo: SuperSenseLearner: Combining SenseLearner with SuperSense and other Coarse Semantic Features

We describe the SUPERSENSELEARNER system that participated in the English all-words disambiguation task. The system relies on automatically learned semantic models using collocational features coupled with features extracted from the annotations of coarse-grained semantic categories generated by an HMM tagger.

BIM: an open ontology for the annotation of biomedical images

Biomedical images published within the scientific literature play a central role in reporting and facilitating life science discoveries. Existing ontologies and vocabularies describing biomedical images, particularly sequence images, do not provide sufficient semantic representation ...

Instance-Driven Attachment of Semantic Annotations over Conceptual Hierarchies

Whether automatically extracted or human generated, open-domain factual knowledge is often available in the form of semantic annotations (e.g., composed-by) that take one or more specific instances (e.g., rhapsody in blue, george gershwin) as their arguments. This paper introduces a method for converting flat sets of instance-level annotations into hierarchically organized, concept-level annota...

AASA: a Method of Automatically Acquiring Semantic Annotations

An important precondition for the success of the Semantic Web is founded on the principle that the content of web pages will be semantically annotated. This paper proposes a method of automatically acquiring semantic annotations (AASA). In the AASA method, we employ a combination of data mining and optimization to acquire semantic annotations. Key features of AASA include combining association ...

Publication date: 2006